Stress Echo Descriptive Analysis

Author

Frank Harrell

Published

August 24, 2022

Introduction

This report reads an existing R dataset (officially a “data frame”) and makes various descriptive summaries, packaging text, code, and output into an html file. Some of the graphics are rendered both as static and as interactive graphs. The interactive ones allow you to pan, zoom, and hover with the mouse to see more details. Interactive graphics are constructed using the R plotly package, which the Hmisc package loads automatically. To run this report you’ll need to install the Hmisc and plotly packages and their dependencies (RStudio will take care of the dependencies).

Loading User-Contributed R Packages

In R a package is a collection of functions, function documentation (“help files”), and example datasets. When you install R and RStudio many standard packages are automatically installed. There are over 10,000 user-contributed packages that extend what R can do. These include specialty packages for biomedical research (e.g., flow cytometry analysis, genomics, interfaces to REDCap) and a huge number of packages for analysis, table making, and graphics. After using the RStudio Tools menu to install add-on packages, you load packages into R to make them available when running your report by using require or library commands. In the following we get access to the Hmisc package.

Code
require(Hmisc)   # Hmisc must already be installed

If you installed Hmisc after 2022-08-15 you can load this complete report script into your RStudio script editor window by running the command (usually from the R console under RStudio) getRs('stressecho.qmd', put='rstudio').

Accessing Data

There are many ways to import data into R, as described here. Much of the time you will be importing .csv or REDCap files or binary files created by Stata, SPSS, or SAS. For learning R there is a wide variety of ready-to-use datasets on the Vanderbilt Department of Biostatistics dataset repository at hbiostat.org/data. These datasets are already in R binary format and many of them are fully documented and annotated. In the Hmisc package there is a function getHdata that finds these datasets, downloads them from the web server, and loads them into R’s memory for easy access.

Use thegetHdata function to download the stress echocardiogram dataset and read it into R’s memory. This dataset is annotated with variable labels and, for continuous variables, units of measurement. These annotations are used by contents, describe, and certain graphics functions.

Sometimes datasets have long names that are tedious to type. I use a convention of copying the currently active dataset into an R data frame called d to save typing during analysis.

Code
getHdata(stressEcho)
d <- stressEcho   # copy dataset so can refer to short name d

Data Dictionary

The names of variables in the stress echo dataset can be printed by using the command names(d). But let’s use an Hmisc package function to provide more metadata1

  • 1 results='asis' in the chunk header prepares R to expect specially formatted html output to be created by the html function.

  • Code
    html(contents(d))
    d Contents

    Data frame:d

    558 observations and 31 variables, maximum # NAs:0  
    NameLabelsUnitsLevelsStorage
    bhrBasal heart ratebpminteger
    basebpBasal blood pressuremmHginteger
    basedpBasal Double Product bhr*basebpbpm*mmHginteger
    pkhrPeak heart ratemmHginteger
    sbpSystolic blood pressuremmHginteger
    dpDouble product pkhr*sbpbpm*mmHginteger
    doseDose of dobutamine givenmginteger
    maxhrMaximum heart ratebpminteger
    pctMphrPercent maximum predicted heart rate achieved%integer
    mbpMaximum blood pressuremmHginteger
    dpmaxdoDouble product on max dobutamine dosebpm*mmHginteger
    dobdoseDobutamine dose at max double productmginteger
    ageAgeyearsinteger
    gender2integer
    baseEFBaseline cardiac ejection fraction%integer
    dobEFEjection fraction on dobutamine%integer
    chestpainChest paininteger
    restwmaResting wall motion abnormality on echocardiograminteger
    posSEPositive stress echocardiograminteger
    newMINew myocardial infarctioninteger
    newPTCARecent angioplastyinteger
    newCABGRecent bypass surgeryinteger
    deathinteger
    hxofHTHistory of hypertensioninteger
    hxofDMHistory of diabetesinteger
    hxofCigHistory of smoking3integer
    hxofMIHistory of myocardial infarctioninteger
    hxofPTCAHistory of angioplastyinteger
    hxofCABGHistory of coronary artery bypass surgeryinteger
    any.eventDeath, newMI, newPTCA, or newCABGinteger
    ecgBaseline electrocardiogram diagnosis3integer

    Category Levels
    gender
  • male
  • female
  • hxofCig
  • heavy
  • moderate
  • non-smoker
  • ecg
  • normal
  • equivocal
  • MI
  • In the main output you’ll find that the number of levels for categorical variables (R factor variables) are highlighted. These are hyperlinks, and if you click on the number the pointer will jump to the list of levels below.

    Descriptive Statistics

    The Hmisc package describe function produces appropriate descriptive statistics for each variable in a data frame depending on the variable’s nature (binary, categorical, continuous, etc.). If you just type describe(d) you’ll get plain text output with no graphics. Running the output of describe through the html function converts it to html format which contains small graphics files to render high-resolution spike histograms for continuous variables.

    See this for definitions of some of the items in describe output.
    Code
    des <- describe(d)   # store results of describe() in des
    html(des)            # can also use html(describe(d))
    d Descriptives
    d

    31 Variables   558 Observations

    bhr: Basal heart rate bpm
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580690.99975.2916.57 54.0 58.0 64.0 74.0 84.0 95.3102.0
    lowest : 42 44 45 46 47 , highest: 108 115 116 127 210
    basebp: Basal blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580940.998135.323.35104.0110.0120.0133.0150.0162.3170.1
    lowest : 85 88 90 97 98 , highest: 192 194 195 201 203
    basedp: Basal Double Product bhr*basebp bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55804411101812813 6607 7200 8400 9792116631361014770
    lowest : 5000 5220 5280 5400 5460 , highest: 17604 17710 17748 21082 27300
    pkhr: Peak heart rate mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801051120.625.36 81.85 90.70106.25122.00135.00147.00155.15
    lowest : 52 61 62 63 66 , highest: 170 171 176 182 210
    sbp: Systolic blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801421146.940.72 96102120141170200210
    lowest : 40 60 70 79 80 , highest: 240 250 274 283 309
    dp: Double product pkhr*sbp bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580508117634576510256113411403317060206442453626637
    lowest : 5100 5940 7490 8100 8360 , highest: 32518 33400 33840 38205 45114
    dose: Dose of dobutamine given mg
    image
    nmissingdistinctInfoMeanGmd
    558070.8433.758.334
    lowest : 10 15 20 25 30 , highest: 20 25 30 35 40
     Value         10    15    20    25    30    35    40
     Frequency      2    28    47    56    64    61   300
     Proportion 0.004 0.050 0.084 0.100 0.115 0.109 0.538
     

    maxhr: Maximum heart rate bpm
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801031119.424.64 82.0 91.0104.2120.0133.0146.0154.1
    lowest : 58 62 63 66 67 , highest: 170 171 176 182 200
    pctMphr: Percent maximum predicted heart rate achieved %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580780.99978.5716.86 53 60 69 78 88 97104
    lowest : 38 39 40 41 42 , highest: 116 117 126 132 133
    mbp: Maximum blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801320.99915635.03110.0120.0133.2150.0175.8200.0211.1
    lowest : 84 90 92 93 96 , highest: 240 250 274 283 309
    dpmaxdo: Double product on max dobutamine dose bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580484118550538511346128651526018118212392489327477
    lowest : 7130 8100 8360 9240 9280 , highest: 32518 33400 33840 38205 45114
    dobdose: Dobutamine dose at max double product mg
    image
    nmissingdistinctInfoMeanGmd
    558080.94130.2410.55
    lowest : 5 10 15 20 25 , highest: 20 25 30 35 40
     Value          5    10    15    20    25    30    35    40
     Frequency      7     7    55    73    71    78    62   205
     Proportion 0.013 0.013 0.099 0.131 0.127 0.140 0.111 0.367
     

    age: Age years
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580620.99967.3413.4146.8551.0060.0069.0075.0082.0085.00
    lowest : 26 28 29 30 33 , highest: 89 90 91 92 93
    gender
    nmissingdistinct
    55802
     Value        male female
     Frequency     220    338
     Proportion  0.394  0.606
     

    baseEF: Baseline cardiac ejection fraction %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580540.99455.610.7132405257626566
    lowest : 20 21 22 23 25 , highest: 74 75 77 79 83
    dobEF: Ejection fraction on dobutamine %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580600.99265.2412.3840.049.762.067.073.076.080.0
    lowest : 23 25 26 27 28 , highest: 86 87 89 90 94
    chestpain: Chest pain
    nmissingdistinctInfoSumMeanGmd
    558020.641720.30820.4272

    restwma: Resting wall motion abnormality on echocardiogram
    nmissingdistinctInfoSumMeanGmd
    558020.7452570.46060.4978

    posSE: Positive stress echocardiogram
    nmissingdistinctInfoSumMeanGmd
    558020.5531360.24370.3693

    newMI: New myocardial infarction
    nmissingdistinctInfoSumMeanGmd
    558020.143280.050180.09549

    newPTCA: Recent angioplasty
    nmissingdistinctInfoSumMeanGmd
    558020.138270.048390.09226

    newCABG: Recent bypass surgery
    nmissingdistinctInfoSumMeanGmd
    558020.167330.059140.1115

    death
    nmissingdistinctInfoSumMeanGmd
    558020.123240.043010.08247

    hxofHT: History of hypertension
    nmissingdistinctInfoSumMeanGmd
    558020.6253930.70430.4173

    hxofDM: History of diabetes
    nmissingdistinctInfoSumMeanGmd
    558020.6992060.36920.4666

    hxofCig: History of smoking
    image
    nmissingdistinct
    55803
     Value           heavy   moderate non-smoker
     Frequency         122        138        298
     Proportion      0.219      0.247      0.534
     

    hxofMI: History of myocardial infarction
    nmissingdistinctInfoSumMeanGmd
    558020.5991540.2760.4004

    hxofPTCA: History of angioplasty
    nmissingdistinctInfoSumMeanGmd
    558020.204410.073480.1364

    hxofCABG: History of coronary artery bypass surgery
    nmissingdistinctInfoSumMeanGmd
    558020.399880.15770.2661

    any.event: Death, newMI, newPTCA, or newCABG
    nmissingdistinctInfoSumMeanGmd
    558020.402890.15950.2686

    ecg: Baseline electrocardiogram diagnosis
    image
    nmissingdistinct
    55803
     Value         normal equivocal        MI
     Frequency        311       176        71
     Proportion     0.557     0.315     0.127
     

    describe also has a plot method that translates descriptive statistics to purely graphical form and draws larger high-resolution histograms for continuous variables. The plot method produces two graphics objects: one for categorical variables and one for continuous variables. We save the overall plot result in object p then reference two sub-objects2

  • 2 For these static graphics we could have just typed plot(des) and two plots would appear in succession. That approach does not work when making interactive graphs later.

  • Code
    p <- plot(des)
    Code
    p$Categorical

    Figure 1: Categorical variables

    Code
    p$Continuous

    Figure 2: Continuous variables

    To render these two graphs in interactive format using plotly we set a system option that the Hmisc package monitors: grType. It’s default value is 'plain'.

    Code
    options(grType='plotly')
    p <- plot(des)
    Code
    p$Categorical

    Figure 3: Categorical variables. Hover over points to see numerators and denominators of proportions, and hover over the leftmost point to see more information.

    Code
    p$Continuous

    Figure 4: Continuous variables. Hover over the leftmost spike to see a full list of descriptive statistics. Hover over other spikes to see \(x\)-axis values and frequency counts.

    Computing Environment

    A function session stored in an object in the Hmisc package prints the current computing environment in a nice format.

     R version 4.2.1 (2022-06-23)
     Platform: x86_64-pc-linux-gnu (64-bit)
     Running under: Pop!_OS 22.04 LTS
     
     Matrix products: default
     BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
     LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
     
     attached base packages:
     [1] stats     graphics  grDevices utils     datasets  methods   base     
     
     other attached packages:
     [1] Hmisc_4.7-2     ggplot2_3.3.5   Formula_1.2-4   survival_3.4-0 
     [5] lattice_0.20-45
     
    To cite R in publications use:

    R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

    To cite the Hmisc package in publications use:

    Harrell Jr F (2022). Hmisc: Harrell Miscellaneous. R package version 4.7-2, https://hbiostat.org/R/Hmisc/.

    To cite the ggplot2 package in publications use:

    Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

    To cite the survival package in publications use:

    Therneau T (2022). A Package for Survival Analysis in R. R package version 3.4-0, https://CRAN.R-project.org/package=survival.